Some theory for Fisher’s linear discriminant function, ‘naive Bayes’, and some alternatives when there are many more variables than observations
Author
Abstract
We show that, in the classical problem of discriminating between two normal populations, the ‘naive Bayes’ classifier, which assumes independent covariates, greatly outperforms Fisher’s linear discriminant rule under broad conditions when the number of variables grows faster than the number of observations. We also introduce a class of rules spanning the range between independence and arbitrary dependence. These rules are shown to achieve Bayes consistency for the Gaussian ‘coloured noise’ model and to adapt to a spectrum of convergence rates, which we conjecture to be minimax.
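The contrast above can be seen in a small simulation. The sketch below is illustrative only and is not taken from the paper: it draws two Gaussian classes with identity covariance, far more variables than observations, and compares Fisher's rule (which needs the pooled covariance inverse; a pseudoinverse is used since the matrix is singular when p > n) against the independence rule that keeps only the diagonal. The dimensions, sample sizes, and mean separation are arbitrary choices for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 200                     # far more variables than observations
mu = np.zeros(p)
mu[:10] = 1.0                      # the two class means differ in 10 coordinates

def simulate(n):
    X0 = rng.standard_normal((n, p))       # class 0: N(0, I)
    X1 = rng.standard_normal((n, p)) + mu  # class 1: N(mu, I)
    return X0, X1

X0, X1 = simulate(n)
m0, m1 = X0.mean(0), X1.mean(0)
Xc = np.vstack([X0 - m0, X1 - m1])
S = Xc.T @ Xc / (2 * n - 2)        # pooled covariance: singular, since p > n

# Fisher's rule: w = S^{-1}(m1 - m0); use a pseudoinverse since S is singular
w_fisher = np.linalg.pinv(S) @ (m1 - m0)
# Naive Bayes / independence rule: use only the diagonal of S
w_nb = (m1 - m0) / np.diag(S)

def err(w):
    """Average misclassification rate on fresh test data."""
    T0, T1 = simulate(2000)
    thresh = w @ (m0 + m1) / 2
    e0 = np.mean(T0 @ w > thresh)  # class 0 misclassified as class 1
    e1 = np.mean(T1 @ w < thresh)  # class 1 misclassified as class 0
    return (e0 + e1) / 2

print(f"Fisher (pinv): {err(w_fisher):.3f}   naive Bayes: {err(w_nb):.3f}")
```

With these settings the independence rule is close to the oracle error while the pseudoinverse Fisher rule degrades badly, matching the qualitative claim of the abstract.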
Similar resources
Supervised classification with conditional Gaussian networks: Increasing the structure complexity from naive Bayes
Most Bayesian network-based classifiers are able to handle only discrete variables. However, most real-world domains involve continuous variables. A common practice for dealing with continuous variables is to discretize them, with a subsequent loss of information. This work shows how discrete classifier induction algorithms can be adapted to the conditional Gaussian network paradigm ...
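The simplest conditional Gaussian network is Gaussian naive Bayes, which models each continuous feature with a per-class normal density and so avoids discretization entirely. A minimal sketch, using made-up toy data (the features, values, and class names are illustrative assumptions, not from the cited work):

```python
import math

# Toy training data: two continuous features per sample, two classes (hypothetical values).
data = {
    "A": [(1.0, 2.1), (0.8, 1.9), (1.2, 2.3)],
    "B": [(3.0, 0.5), (2.8, 0.4), (3.2, 0.7)],
}

def fit(samples):
    """Per-class mean and variance of each feature: the 'naive' conditional Gaussian model."""
    cols = list(zip(*samples))
    mean = [sum(c) / len(c) for c in cols]
    var = [sum((x - m) ** 2 for x in c) / (len(c) - 1) for c, m in zip(cols, mean)]
    return mean, var

params = {c: fit(s) for c, s in data.items()}

def log_gauss(x, m, v):
    """Log density of N(m, v) at x."""
    return -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)

def classify(x):
    """Pick the class maximizing the sum of per-feature Gaussian log-likelihoods."""
    def score(c):
        mean, var = params[c]
        return sum(log_gauss(xi, m, v) for xi, m, v in zip(x, mean, var))
    return max(params, key=score)

print(classify((1.1, 2.0)))  # a point near the class-A training samples
```

No discretization step is needed: the continuous densities are used directly in the classification score.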
Fisher’s Linear Discriminant Analysis for Weather Data by reproducing kernel Hilbert spaces framework
Recently, with the development of science and technology, data of a functional nature have become easy to collect. Hence, statistical analysis of such data is of great importance. As in multivariate analysis, linear combinations of random variables play a key role in functional analysis. The theory of Reproducing Kernel Hilbert Spaces is very important in this context. In this paper we study a gen...
Classic and Bayes Shrinkage Estimation in Rayleigh Distribution Using a Point Guess Based on Censored Data
Introduction: In classical methods of statistics, the parameter of interest is estimated from a random sample using natural estimators such as maximum likelihood or unbiased estimators (sample information). In practice, the researcher has prior information about the parameter in the form of a point guess value. The information in the guess value is called nonsample information. Thomp...
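The shrinkage idea described in the snippet is simply a convex combination of the point guess and the sample estimate. The sketch below illustrates it for the Rayleigh scale parameter with complete (uncensored) data; the censoring machinery of the cited work is not modeled, and the weight k is a fixed illustrative choice rather than the paper's data-driven one.

```python
import math
import random

random.seed(1)

def rayleigh_sample(sigma, n):
    """Draw n Rayleigh(sigma) variates by inverse-CDF sampling: X = sigma*sqrt(-2 ln U)."""
    return [sigma * math.sqrt(-2 * math.log(1.0 - random.random())) for _ in range(n)]

def mle_sigma2(xs):
    """MLE of sigma^2 for the Rayleigh distribution: sum(x^2) / (2n)."""
    return sum(x * x for x in xs) / (2 * len(xs))

def shrinkage_sigma2(xs, guess_sigma2, k=0.5):
    """Shrink the MLE toward a point guess: k*guess + (1-k)*MLE, for 0 <= k <= 1."""
    return k * guess_sigma2 + (1 - k) * mle_sigma2(xs)

xs = rayleigh_sample(2.0, 5000)
print(mle_sigma2(xs), shrinkage_sigma2(xs, guess_sigma2=4.0))
```

When the guess is accurate, the shrinkage estimator trades a little bias for reduced variance relative to the MLE alone.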
A New Classification Approach using Discriminant Functions
There are many algorithms for, and many applications of, classification and discrimination (grouping a set of objects into subsets of similar objects, where the objects in different subsets are dissimilar) in diverse fields [2-15, 23, 24], ranging from engineering to medicine to econometrics. Some examples are automatic target recognition (ATR) and fault and maintenance-time recognit...
Analysis of sequential physiology data with weighted naive Bayes
In this project, I describe how I addressed the ICML 2004 Physiological Data Modeling Contest. For the gender prediction task, I compressed the large entry-based dataset into a small session-based dataset and manually devised 90 features using a histogram method. Weighted naive Bayes (WNB), an extension of naive Bayes, was applied, and Markov Chain Monte Carlo was combined to solve the weight u...
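Weighted naive Bayes replaces the plain sum of per-feature log-likelihoods with a weighted sum, so uninformative features can be down-weighted or switched off. A minimal sketch with hypothetical histogram-style feature probabilities (the class names, feature values, and numbers below are invented for illustration and are not the contest's actual features):

```python
import math

# Hypothetical per-class, per-feature histogram probabilities: P(feature_i = value | class)
probs = {
    "male":   [{"low": 0.2, "high": 0.8}, {"low": 0.7, "high": 0.3}],
    "female": [{"low": 0.6, "high": 0.4}, {"low": 0.3, "high": 0.7}],
}
prior = {"male": 0.5, "female": 0.5}

def wnb_classify(x, weights):
    """Weighted naive Bayes: argmax_c log P(c) + sum_i w_i * log P(x_i | c)."""
    def score(c):
        return math.log(prior[c]) + sum(
            w * math.log(probs[c][i][v]) for i, (v, w) in enumerate(zip(x, weights))
        )
    return max(probs, key=score)

# Plain naive Bayes is the special case where every weight equals 1.
print(wnb_classify(("high", "low"), [1.0, 1.0]))
```

Setting a feature's weight to 0 removes its vote entirely, which is how a learned weight vector can effectively prune noisy features.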
Journal title:
Volume / Issue:
Pages: -
Publication date: 2004